-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] adding pivot.max_search_page_size option for setting paging size #41920
[ML] adding pivot.max_search_page_size option for setting paging size #41920
Conversation
Pinging @elastic/ml-core |
1def6de
to
8cbb8e7
Compare
8cbb8e7
to
e9944c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few points to discuss.
this.groups = ExceptionsHelper.requireNonNull(groups, DataFrameField.GROUP_BY.getPreferredName()); | ||
this.aggregationConfig = ExceptionsHelper.requireNonNull(aggregationConfig, DataFrameField.AGGREGATIONS.getPreferredName()); | ||
this.size = size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could require a non-null value for this. Then from the REST action we set it to the default value if the user did not specify it. This way the default is not hidden and the user becomes aware of the parameter. We tend to follow this approach unless the default value is dynamically calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the current approach as I would not promote this setting like query. My opinion: should be an expert setting.
@@ -27,6 +27,7 @@ | |||
public static final ParseField SOURCE = new ParseField("source"); | |||
public static final ParseField DESTINATION = new ParseField("dest"); | |||
public static final ParseField FORCE = new ParseField("force"); | |||
public static final ParseField SIZE = new ParseField("size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know size
is what the search expects and in the context of a search it is a size. However, in datafeeds and now data frame transforms, this is better described as the scroll_size
or page_size
. I'm not feeling too strongly about this but it's worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, rollup uses page_size
, I find size
to unspecific
@@ -29,6 +30,7 @@ | |||
private static final String NAME = "data_frame_transform_pivot"; | |||
private final GroupConfig groups; | |||
private final AggregationConfig aggregationConfig; | |||
private final Integer size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On having size
in pivot instead of the transform object itself:
I think the decision on this depends on whether we imagine that a transform will always use a composite aggregation. If yes, then size
should be on the top level.
Looks good, we should decide on the naming. For me this is an expert setting, so I would not promote it. API level I find it confusing from the user perspective: the usage of composite aggs is an implementation detail and a parameter "size" isn't descriptive enough. It would be better to give it a more descriptive, explicit name: About placement: If it's just 1 setting it's ok to put it into My proposal, a top-level
The parameters might not apply to the function used in the transform, but I think that's simpler than having a |
run elasticsearch-ci/1 |
run elasticsearch-ci/1 |
1 similar comment
run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…elastic#41920) * [ML] adding pivot.size option for setting paging size * Changing field name to address PR comments * fixing ctor usage * adjust hlrc for field name change
* elastic/master: (84 commits) [ML] adds geo_centroid aggregation support to data frames (elastic#42088) Add documentation for calendar/fixed intervals (elastic#41919) Remove global checkpoint assertion in peer recovery (elastic#41987) Don't create tempdir for cli scripts (elastic#41913) Fix debian-8 update (elastic#42056) Cleanup plugin bin directories (elastic#41907) Prevent order being lost for _nodes API filters (elastic#42045) Change IndexAnalyzers default analyzer access (elastic#42011) Remove reference to fs.data.spins in docs Mute failing AsyncTwoPhaseIndexerTests Remove close method in PageCacheRecycler/Recycler (elastic#41917) [ML] adding pivot.max_search_page_size option for setting paging size (elastic#41920) Docs: Tweak list formatting Simplify handling of keyword field normalizers (elastic#42002) [ML] properly nesting objects in document source (elastic#41901) Remove extra `ms` from log message (elastic#42068) Increase the sample space for random inner hits name generator (elastic#42057) Recognise direct buffers in heap size docs (elastic#42070) shouldRollGeneration should execute under read lock (elastic#41696) Wait for active shard after close in mixed cluster (elastic#42029) ...
…elastic#41920) * [ML] adding pivot.size option for setting paging size * Changing field name to address PR comments * fixing ctor usage * adjust hlrc for field name change
This adds a
size
parameter to thepivot
configuration in a data frame transform.I put the parameter inside the
pivot
configuration and not in the root object, as it relates directly to how we are pivoting the data. Since we are unsure what the next transform looks like, I thought it best to restricted it instead of making it atransform
field.If it is
null
we still opt to use the internal default of500
.